Unbiased probabilistic taxonomic classification for DNA barcoding

Authors

  • Panu Somervuo
  • Sonja Koskela
  • Juho Pennanen
  • R. Henrik Nilsson
  • Otso Ovaskainen

Abstract

MOTIVATION: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomical units from environmental samples, and thus to study the diversity and structure of species communities. Although many methods provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values into unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX distributes a total probability of one across the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but lack reference sequences, for the possibility of unknown taxonomical units, and for mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can use any kind of sequence similarity measure, or the outputs of other classifiers, as predictors.

RESULTS: We demonstrate the performance of PROTAX by using as predictors the output from BLAST, the phylogenetic classification software TIPP, and the RDP classifier. We show that PROTAX improves the predictions of the baseline implementations of the TIPP and RDP classifiers, and that it is able to combine complementary information provided by BLAST and TIPP, resulting in accurate and unbiased classifications even in very challenging cases such as 50% mislabeling of reference sequences.

AVAILABILITY AND IMPLEMENTATION: A Perl/R implementation of PROTAX is available at http://www.helsinki.fi/science/metapop/Software.htm

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
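The idea of splitting a total probability of one across all outcomes of a taxonomic tree, with a multinomial regression at each node, can be illustrated with a minimal Python sketch. This is not the PROTAX implementation (which the abstract describes as Perl/R); the taxon names, the three predictors (has a reference, best similarity, is the "unknown" outcome), and the regression weights below are all hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_probability(parent_prob, children, weights):
    """Divide a node's probability among its child outcomes.

    children maps an outcome name to its predictor vector; weights are the
    (hypothetical) multinomial regression coefficients for those predictors.
    """
    names = list(children)
    scores = [sum(w * x for w, x in zip(weights, children[n])) for n in names]
    return {n: parent_prob * p for n, p in zip(names, softmax(scores))}


# Hypothetical predictors: [has_reference, best_similarity, is_unknown]
weights = [-2.0, 8.0, -1.0]  # assumed coefficients, not fitted values

# Genus level: two known genera plus an "unknown genus" outcome.
genus_probs = split_probability(1.0, {
    "GenusA": [1.0, 0.95, 0.0],
    "GenusB": [1.0, 0.80, 0.0],
    "unknown genus": [0.0, 0.0, 1.0],
}, weights)

# Species level under GenusA: one species with a reference sequence, one
# named in the taxonomy but without a reference, and an unknown species.
species_probs = split_probability(genus_probs["GenusA"], {
    "GenusA sp1": [1.0, 0.95, 0.0],
    "GenusA sp2 (no reference)": [0.0, 0.0, 0.0],
    "GenusA unknown species": [0.0, 0.0, 1.0],
}, weights)

# Final outcomes: the expanded species plus the branches left unexpanded.
leaf_probs = dict(species_probs)
leaf_probs["GenusB"] = genus_probs["GenusB"]
leaf_probs["unknown genus"] = genus_probs["unknown genus"]

for name, prob in leaf_probs.items():
    print(f"{name}: {prob:.3f}")
```

Because each split multiplies the parent's probability by a softmax that sums to one, the outcome probabilities sum to one by construction, including the outcomes for unreferenced and unknown taxa. In a real analysis the predictors would come from tools such as BLAST, TIPP, or RDP, and the weights would be estimated from training data.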

Similar resources

Utility of DNA taxonomy and barcoding for the inference of larval community structure in morphologically cryptic Chironomus (Diptera) species.

Biodiversity studies require species level analyses for the accurate assessment of community structures. However, while specialized taxonomic knowledge is only rarely available for routine identifications, DNA taxonomy and DNA barcoding could provide the taxonomic basis for ecological inferences. In this study, we assessed the community structure of sediment dwelling, morphologically cryptic Ch...

The unholy trinity: taxonomy, species delimitation and DNA barcoding.

Recent excitement over the development of an initiative to generate DNA sequences for all named species on the planet has in our opinion generated two major areas of contention as to how this 'DNA barcoding' initiative should proceed. It is critical that these two issues are clarified and resolved, before the use of DNA as a tool for taxonomy and species delimitation can be universalized. The f...

Rapid dissemination of taxonomic discoveries based on DNA barcoding and morphology

The taxonomic impediment is characterized by dwindling classical taxonomic expertise, and slow pace of revisionary work, thus more rapid taxonomic assessments are needed. Here we pair rapid DNA barcoding methods with swift assessment of morphology in an effort to gauge diversity, establish species limits, and rapidly disseminate taxonomic information prior to completion of formal taxonomic revi...

DNA barcoding will frequently fail in complicated groups: An example in wild potatoes.

DNA barcoding ("barcoding") has been proposed as a rapid and practical molecular method to identify species via diagnostic variation in short orthologous DNA sequences from one or a few universal genomic regions. It seeks to address in a rapid and simple way the "taxonomic impediment" of a greater need for taxonomic identifications than can be supplied by taxonomists. Using a complicated plant ...

Journal:
  • Bioinformatics

Volume 32, Issue 19

Pages: -

Publication date: 2016